
    Design and Deployment of an Access Control Module for Data Lakes

    Big data is now considered an extremely valuable asset for companies, which are discovering new ways to use it for business profit. However, an organization's ability to extract valuable information from data depends on its knowledge management infrastructure. Thus, most organizations are transitioning from data warehouse (DW) storage to data lake (DL) infrastructures, from which further insights are derived. The present work is carried out as part of a cybersecurity project in a financial institution that manages a large volume and variety of data stored in a data lake. Although the DL is presented as the answer to the current big data scenario, this infrastructure has certain flaws in authentication and access control. Previous work on DL access control points out that the main goal is to prevent fraudulent behaviors arising from users' access, such as secondary use, that could result in business data being exposed to third parties. To mitigate this risk, traditional mechanisms attempt to identify these behaviors with rules; however, they cannot reveal every kind of fraud because they only look for known patterns of misuse. The present work proposes a novel access control system for data lakes, assisted by Oracle's database audit trail and based on anomaly detection mechanisms, that automatically looks for events that do not conform to normal or expected behavior. The overall aim of this project is therefore to develop and deploy an automated system for identifying abnormal accesses to the DL, which can be separated into four subgoals: explore the technologies that could be applied in the domain of anomaly detection, design the solution, deploy it, and evaluate the results. For this purpose, feature engineering is performed, and four different unsupervised ML models are built and evaluated. Based on the quality of the results, the best-performing model is finally deployed to production with Docker.
To conclude, although anomaly detection has been an active research area for several decades, it still presents unique complexities and challenges that leave room for the proposed solution to be improved further.
Dual Degree in Computer Engineering and Business Administration (Doble Grado en Ingeniería Informática y Administración de Empresa)
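
The contrast the abstract draws between fixed rules and anomaly detection can be illustrated with a minimal sketch. It is not the thesis's method: the abstract does not name its four unsupervised models, so a simple robust z-score (median and median absolute deviation) is swapped in here. The record layout, the user names, the `AUDIT_EVENTS` data, and the 3.5 threshold are all assumptions made for illustration and do not come from a real Oracle audit trail.

```python
from collections import Counter
from statistics import median

# Hypothetical audit records: (user, hour_of_day, object_accessed).
# All names and values are invented for illustration.
AUDIT_EVENTS = (
    [("alice", 9, "SALES")] * 3
    + [("bob", 14, "HR")] * 4
    + [("carol", 10, "SALES")] * 3
    + [("dave", 11, "SALES")] * 4
    + [("eve", 2, "CUSTOMERS")] * 40  # unusually heavy access
)


def robust_z_scores(values):
    """Return robust z-scores based on the median and the MAD.

    The 0.6745 factor rescales the MAD so the score is comparable
    to a standard z-score for normally distributed data.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be ranked as unusual.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalous_users(events, threshold=3.5):
    """Engineer one feature (accesses per user) and flag outliers.

    No rule describes what misuse looks like; a user is flagged only
    because their behavior deviates from that of the other users.
    """
    counts = Counter(user for user, _hour, _obj in events)
    users = sorted(counts)
    scores = robust_z_scores([counts[u] for u in users])
    return {u: s for u, s in zip(users, scores) if abs(s) > threshold}


if __name__ == "__main__":
    for user, score in flag_anomalous_users(AUDIT_EVENTS).items():
        print(f"{user}: robust z-score {score:.2f}")
```

A production system would likely use several engineered features (time of access, objects touched, statement types) and a multivariate model rather than a single count, but the principle is the same: the baseline is learned from the data instead of being written as rules.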